Kernel ridge vs. principal component regression: Minimax bounds and the qualification of regularization operators
Authors
Abstract
Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We derive an explicit finite-sample risk bound for regularization-based estimators that simultaneously accounts for (i) the structure of the ambient function space, (ii) the regularity of the true regression function, and (iii) the adaptability (or qualification) of the regularization. A simple consequence of this upper bound is that the risk of the regularization-based estimators matches the minimax rate in a variety of settings. The general bound also illustrates how some regularization techniques are more adaptable than others to favorable regularity properties that the true regression function may possess. This, in particular, demonstrates a striking difference between kernel ridge regression and kernel principal component regression. Our theoretical results are supported by numerical experiments.
MSC 2010 subject classifications: 62G08.
∗DH acknowledges support from NSF grants DMR-1534910 and IIS-1563785 and a Sloan Research Fellowship; LHD’s work was partially supported by NSF grants DMS-1208785 and DMS-1454817.
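The contrast the abstract draws between ridge and principal component regularization can be illustrated with spectral filter functions, a standard way to describe such a class of estimators. The sketch below is not taken from the article: the Gaussian kernel, the helper names `rbf_kernel` and `spectral_fit`, the synthetic data, and the values of `bandwidth` and `lam` are all assumptions chosen for illustration. It assumes the ridge filter g(s) = 1/(s + lam) and the principal component filter g(s) = 1/s for s >= lam (zero otherwise), applied to the eigenvalues of the normalized kernel matrix K/n.

```python
import numpy as np


def rbf_kernel(x, z, bandwidth=0.5):
    """Gaussian kernel matrix between 1-D sample arrays x and z (illustrative choice)."""
    sq_dist = (x[:, None] - z[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def spectral_fit(K, y, lam, method):
    """Return coefficients alpha so that fitted values are K @ alpha.

    The estimator applies a filter g to the eigenvalues of K / n:
      ridge: g(s) = 1 / (s + lam)
      pcr:   g(s) = 1 / s if s >= lam, else 0
    """
    n = len(y)
    evals, evecs = np.linalg.eigh(K / n)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative round-off
    if method == "ridge":
        g = 1.0 / (evals + lam)
    elif method == "pcr":
        g = np.zeros_like(evals)
        keep = evals >= lam
        g[keep] = 1.0 / evals[keep]
    else:
        raise ValueError(f"unknown method: {method}")
    return evecs @ (g * (evecs.T @ y)) / n


# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0.0, 1.0, size=n))
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(n)

K = rbf_kernel(x, x)
lam = 1e-2
alpha_ridge = spectral_fit(K, y, lam, "ridge")
alpha_pcr = spectral_fit(K, y, lam, "pcr")

fitted_ridge = K @ alpha_ridge
fitted_pcr = K @ alpha_pcr
print("ridge training MSE:", np.mean((fitted_ridge - y) ** 2))
print("pcr training MSE:  ", np.mean((fitted_pcr - y) ** 2))
```

Under these assumptions the ridge coefficients coincide with the usual closed form (K + n·lam·I)^{-1} y, while the principal component fit is the projection of y onto the eigenvectors whose eigenvalues are at least lam. The ridge filter shrinks every component, whereas the principal component filter leaves retained components unshrunk, which is the kind of difference the abstract refers to as qualification.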
Similar resources
Kernel ridge vs. principal component regression: minimax bounds and adaptability of regularization operators
Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component reg...
Kernel methods and regularization techniques for nonparametric regression: Minimax optimality and adaptation
Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We f...
Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates
We study a decomposition-based scalable approach to kernel ridge regression, and show that it achieves minimax optimal convergence rates under relatively mild conditions. The method is simple to describe: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset using a careful choice of the regularization ...
Early Stopping and Non-parametric Regression: An Optimal Data-dependent Stopping Rule
Early stopping is a form of regularization based on choosing when to stop running an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy for a form of gradient-descent applied to the least-squares loss function. We propose a data-dependent stopping rule that does not involve hold-out or cross-validation data, a...
Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral-Regularization Algorithms
We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We first investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds can be retained for distributed SGM provided that the partition level is not t...
Journal title:
Volume, issue:
Pages: -
Publication date: 2017